Human Parsing


Human parsing is the task of identifying, segmenting, and categorizing the parts of a human body in an image or video, such as the head, shoulders, knees, and toes.
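As a rough illustration of the definition above, the output of human parsing can be thought of as a map that assigns one part label to every pixel. The sketch below is a toy example only: the label set `PART_NAMES`, the 4x4 `parsing_map`, and the helper `pixels_per_part` are hypothetical names and values invented here, not taken from any dataset or paper listed on this page.

```python
from collections import Counter

# Hypothetical label set; real datasets define their own part categories.
PART_NAMES = {0: "background", 1: "head", 2: "shoulders", 3: "knees", 4: "toes"}

# Toy 4x4 parsing map: one part label per pixel (values invented for illustration).
parsing_map = [
    [0, 1, 1, 0],
    [2, 2, 2, 2],
    [0, 3, 3, 0],
    [0, 4, 4, 0],
]


def pixels_per_part(label_map):
    """Count how many pixels are assigned to each part name."""
    counts = Counter(label for row in label_map for label in row)
    return {PART_NAMES[label]: n for label, n in counts.items()}


part_counts = pixels_per_part(parsing_map)
print(part_counts)
```

In practice a model would predict such a map at full image resolution; the nested list here only stands in for that output.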

A Scalable Pipeline for Estimating Verb Frame Frequencies Using Large Language Models

Jul 29, 2025

T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation

Jul 28, 2025

Text-to-SPARQL Goes Beyond English: Multilingual Question Answering Over Knowledge Graphs through Human-Inspired Reasoning

Jul 22, 2025

What's in the Box? Reasoning about Unseen Objects from Multimodal Cues

Jun 17, 2025

Benchmarking Multimodal LLMs on Recognition and Understanding over Chemical Tables

Jun 13, 2025

GTR-CoT: Graph Traversal as Visual Chain of Thought for Molecular Structure Recognition

Jun 09, 2025

InteractAnything: Zero-shot Human Object Interaction Synthesis via LLM Feedback and Object Affordance Parsing

May 30, 2025

PhysLab: A Benchmark Dataset for Multi-Granularity Visual Parsing of Physics Experiments

Jun 07, 2025

BodyGPS: Anatomical Positioning System

May 12, 2025

Chain-of-Talkers (CoTalk): Fast Human Annotation of Dense Image Captions

May 28, 2025